This paper considers the use of massively parallel architectures to execute discrete-event simulations
of what we term "self-initiating" models. A logical process in a self-initiating model schedules its own
state re-evaluation times, independently of any other logical process, and sends its new state to other
logical processes following the re-evaluation. Our interest is in the effects of that communication on
synchronization. We consider the performance of various synchronization protocols by deriving upper
and lower bounds on optimal performance, upper bounds on Time Warp's performance, and lower bounds
on the performance of a new conservative protocol. Our analysis of Time Warp includes the overhead costs
of state-saving and rollback. The analysis points out sufficient conditions for the conservative protocol to
outperform Time Warp. The analysis also quantifies the sensitivity of performance to message fan-out,
lookahead ability, and the probability distributions underlying the simulation.
1  Introduction


The problem of parallelizing discrete-event simulations has recently received a great deal of attention. Parallel
simulations are typically described as a collection of Logical Processes, or LPs. Each LP maintains its own
simulation clock, and communicates with other LPs using time-stamped messages. We assume each LP
executes on its own processor, as it might on a massively parallel architecture. The state of an LP at
simulation time t depends on the contents of all messages that should be sent to it with time-stamps less
than t. There are two primary ways in which an LP re-evaluates its state. One way is epitomized by a
queueing network simulation: a job leaving one queue (LP) causes the receiving queue to re-evaluate its
state. This is an example of a message-initiating model, because state re-evaluations at an LP are caused by
messages sent from other LPs. A different method occurs when an LP alone determines when to re-evaluate
its state. The LP will send messages to other LPs following a re-evaluation, because those LPs' eventual
re-evaluations will require that information. However, the state messages do not cause the recipients to
re-evaluate their state. The messages cause events that serve only to store the transmitted state information.
This paper concerns such models; we will call them self-initiating models. As we later discuss, this class of
models includes problems as diverse as the Ising spin simulation [14] and trace-driven multiprocessor cache
simulations.

Synchronization has been a major concern of research in parallel simulation. One way of ensuring
correctness is to block an LP from computing its state at t if there is any chance that it will later receive
a message with time-stamp s < t. This type of blocking is an open invitation to deadlock: irregular
and unpredictable synchronization requirements make parallelizing discrete-event simulations a non-trivial
problem. Early research efforts focused on developing deadlock-free synchronization protocols. Two schools
of thought emerged. The conservative school studied protocols that maintain consistency in the simulation
state: an LP is never allowed to advance its clock so far that it can receive a message in its past. LPs
exploit specific information about the simulation model to avoid or break deadlock. The optimistic school
proposed Time Warp, a scheme that permits an LP to advance its clock without blocking. When an LP
does receive a message in its past, it "rolls back" its clock to the point of the temporal fault, and restores its
state to one existing prior to the fault. Time Warp does not need to use specific model information. Indeed,
a major attraction of Time Warp is its transparency to the simulation modeler.

Before parallel machines were commonly available, the debate between conservative and optimistic camps
was largely philosophical. Then, as performance studies were published, no consistently best choice
emerged. The earliest conservative protocols of Chandy and Misra were shown to suffer from serious
performance problems on some queueing network simulations [22], but have recently been shown to work well
on road network simulations [17]. Other conservative protocols, notably [id] and [21], achieved acceptable
performance on some problems by exploiting information about the simulation model. Time Warp too was
shown to achieve acceptable performance on some problems [3, 4]. The overhead costs of state-saving and
rollback continue to be a major drawback to all optimistic schemes; hardware accelerators for these functions
have been proposed [1, 5].

Throughout this debate, little analytic theory was developed to predict, explain, or bound the
performance of parallel simulations. Exceptions are the detailed analyses developed in [9] and [18]. However, these
studies are limited to two processors, and have not been extended. Theory for massively parallel
simulations is now starting to appear. Wagner and Lazowska derive an upper bound on the speedups possible
in a queueing network simulation [25]. Studies of Time Warp tend to assume negligible state-saving and
rollback costs. Lin and Lazowska have shown that if Time Warp has no state-saving or rollback costs, and
if "correct" computations are never rolled back, then Time Warp achieves optimality [11]. This is intuitive,
because Time Warp aggressively searches for the simulation's critical path; if it is able to do so without
cost, its performance must be optimal. Other analyses highlight the fact that Time Warp can "guess right"
while conservative methods must block. Lipton and Mizell have shown that there is a certain asymmetry
between optimistic and conservative methods: while it is possible for an optimistic method to arbitrarily
outperform a conservative method, the converse is not true [12]. Madisetti, Walrand, and Messerschmitt [16]
have developed a performance model that aspires to estimate the rate at which simulation time advances
under an optimistic strategy such as Time Warp. They model the behavior of the system as a Markov chain,
and include the cost of communication and of synchronization. Their analysis is exact for two processors,
and approximate for a general number of processors. Their analysis is interesting in that it permits a study of
different re-synchronization schemes. However, it does not address issues we attack directly, namely, bounds
on optimal performance and sensitivity to message fanout and lookahead ability.

Analytic studies of conservative protocols [15, 20] are of synchronous protocols, a significant departure
from the field's roots in distributed systems. These studies have established the important property that
performance of the studied methods scales up with increasing problem size and architecture. Furthermore,
the analysis in [20] demonstrates that as the problem size increases relative to the architecture, performance
under the method converges to optimality. The rate of convergence depends very much on the nature of the
stochastic processes driving the simulation.

A number of issues have not yet been directly addressed analytically, and are the focus of this paper.
Specifically, we place non-trivial upper bounds on optimal performance; we include the overhead costs of
Time Warp in a model that bounds its performance from above; we study a new conservative protocol and
place a lower bound on its performance; and we give conditions under which the conservative protocol achieves
better performance than Time Warp. In the course of these derivations we quantify (approximately) the
sensitivity of performance to lookahead ability, message fan-out, and the probability distributions driving
the simulation. All of these aforementioned factors are shown to have significant influence on performance:
performance improves as lookahead ability improves or as the variability of the probability distribution
decreases, and performance degrades as the message fanout increases. It is important to note that these
conclusions are derived in the context of self-initiating simulation models only. Different but related results can be
derived in the context of message-initiating models¹.

2  Model 

We model a parallel simulation as a collection of $N$ logical processors (LPs) named $LP_1, \ldots, LP_N$. Each
logical processor has its own simulation clock. LPs communicate through the exchange of time-stamped
messages. Viewed from the perspective of simulation time, an LP advances forward by executing some
activity that we will call a cycle. In the self-initiating models we consider, at the end of one cycle the LP
schedules the end of the next cycle, independently of any messages it may have received from other LPs.
We let $C_i(j)$ denote the value of $LP_i$'s clock at the end of the $j$th cycle. The length of simulation time that
$LP_i$ advances by executing its $j$th cycle is a random number $X_{ij}$ from a distribution $\mathcal{F}$. Consequently, for
every $LP_i$ and cycle $j$,

$$C_i(j) = \sum_{k=1}^{j} X_{ik}.$$

Assuming that the time increment variables are all independent, $C_i(j)$ can be interpreted as the time of the
$j$th renewal in some renewal process [24] with inter-renewal distribution $\mathcal{F}$. We introduce communication
to the model by assuming that each $LP_i$ associates a set of $K$ messages with the completion of each of its
cycles. $K$ is called the message fanout. Typically, these $K$ messages are intended to inform "nearby" LPs
of the new state just computed. The arrival of such a message at an LP may cause an event, but one that
serves only to store the transmitted value. $K = 2$ might be appropriate in a 1D domain, $K = 4$ or $K = 8$ in
a 2D domain, and $K = 6$ or $K = 26$ in a 3D domain. It is important to note that under
our formulation these message fanouts are part of the simulation model, and hence are independent of the
synchronization protocol used. A message associated with the completion of $LP_i$'s $j$th cycle has time-stamp
$C_i(j)$.
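The clock model above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the exponential increment distribution, the mean of 2.0, the seed, and the helper names `cycle_end_times` and `messages_for_cycle` are all hypothetical choices made here, not part of the paper's model.

```python
import random
from itertools import accumulate

def cycle_end_times(num_cycles, draw_increment):
    """Return [C(1), ..., C(num_cycles)]: running sums of iid increments X_1, X_2, ..."""
    increments = [draw_increment() for _ in range(num_cycles)]
    return list(accumulate(increments))

def messages_for_cycle(sender, end_time, fanout):
    """One (sender, time_stamp) pair per message; all K messages carry the cycle's end time."""
    return [(sender, end_time) for _ in range(fanout)]

rng = random.Random(0)   # fixed seed so the sketch is repeatable (assumption)
mean = 2.0               # illustrative mean cycle length (assumption)

# expovariate takes a rate, so a mean of `mean` corresponds to rate 1 / mean.
clock = cycle_end_times(5, lambda: rng.expovariate(1.0 / mean))

# Messages tied to the completion of the third cycle of LP 0, with fanout K = 4.
msgs = messages_for_cycle(sender=0, end_time=clock[2], fanout=4)
```

Each entry of `clock` is one renewal epoch of the LP, and every message in `msgs` is stamped with the end time of the cycle it is associated with, independent of when the message is physically sent.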

The simulation is modeled as $N$ statistically independent, concurrent renewal processes that
communicate. Certain points in the analysis to follow are made possible by the assumption of statistical independence
between the recipients of a common message. To support this need for independence we assume that the $K$
recipients of a message are chosen uniformly at random from the set of all LPs, and that each LP
independently chooses a new set of recipients each cycle. This assumption does not accurately model the behavior
of any common simulation model, and is used purely to promote tractability. We have performed
simulation studies of our analytic model using "nearest-neighbor" communication, and have found that processor
utilizations are only slightly higher than those achieved using the randomized communication patterns our
analysis assumes. We may have some confidence therefore that the conclusions derived under the assumption
of randomized communication are not completely off the mark.

1 Performance Bounds on Parallel Message-Initiating Discrete-Event Simulations. D. Nicol, in preparation.

We will consider two different forms for the probability distribution $\mathcal{F}$. In one form $\mathcal{F}$ has a continuous
cumulative probability distribution function, implying that its associated renewal process is non-lattice [24]².
This form excludes simulation models where time-increments move forward more discretely. We therefore
also derive results under the assumption that $\mathcal{F}$ is a geometric distribution with mean
$1/p$, where $0 < p < 1$.

Depending on the simulation model, it may be possible to send the messages associated with the
completion of a cycle before an LP actually executes that cycle. In some cases the content of the messages cannot be
predicted, but the time of the messages can. In the former case we will say the simulation has full-lookahead;
in the latter case we say it has time-lookahead. An example of a model with time-lookahead is the Ising spin
simulation [14]. LPs model individual particles, each of which is "jiggled" by thermal effects at random
intervals. When a particle is jiggled its new magnetic spin is computed as a function of the spins of nearby
atoms at that simulation time. The length of simulation time between jigglings defines a cycle. We are able
to predict when next a particle will be jiggled (this time comes from a random number generator), but will
not know the spins of nearby particles at that simulation time until the simulation actually advances that
far.

An example of a model with full-lookahead (although it is a message-initiating model) is a queueing
network with a non-preemptive and load-independent queueing discipline. At the time a job enters service,
say $s$, we can predict the time at which it will leave service, say $t$. In fact, we can notify the recipient queue
of that job's arrival at time $t$. This is not to say that we can actually simulate up to time $t$. For example,
one of the statistics we may be interested in is the average length of the queue at the time a job departs.
To measure the queue length at $t$ we need to receive any additional jobs that may arrive between times $s$
and $t$. The lookahead ability derives from the fact that arrivals between $s$ and $t$ in no way affect the output
behavior of the LP at time $t$.

Another example of a model with full-lookahead is a simple trace-driven multiprocessor cache simulation
that estimates hit statistics, such as that described in [10]. An LP models one processor's cache; cycles
are composed of the processing of a contiguous sequence of purely local memory references terminated by a
reference to global memory; the "time" of any reference is the number of trace references preceding it³. An
LP sends messages to all other LPs whenever it makes a reference to global memory. By looking at its own
trace the LP can predict the time and content of its future messages. Throughout this paper, protocols that
exploit full-lookahead will do so by requiring an LP to send a message as soon as it is able to predict that
message. We will still require that an LP not process a cycle with completion time $t$ until all messages with
time-stamps less than $t$ have been received, because certain calculations internal to the LP (e.g., statistics
gathering) may require that this monotonicity be preserved. Protocols that exploit time-lookahead do so by
requiring an LP to send an appointment message containing the time of a future message as soon as it is
able to predict that a message will later be sent with that time-stamp. Our analysis will be of simulations
with full-lookahead. We will later remark on how that analysis can be extended to simulations with only
time-lookahead.

2 A non-negative random variable $X$ is non-lattice if there does not exist any real number $d$ such that $\sum_{n} \Pr\{X = nd\} = 1$.

3 Note that this property would not be satisfied by a simulation that more accurately models the advancement of time, e.g.,
one that accounts more time for a miss than a hit. A weaker form of lookahead exists where the LP can put a lower bound on
the time of its next global reference by assuming all local references will be hits.

We assume that processing a cycle requires one tick of real time. This permits us to view the progress of
the simulation synchronously. While an LP will read all messages sent to it at each tick, it need not process
a cycle every tick; in fact, synchronization constraints may prevent it from doing so.

Some synchronization protocol must be used to ensure correctness. A conservative protocol prevents an
LP from advancing so far that it can receive a message with a time-stamp smaller than its clock value. For
example, imagine a situation where $LP_i$'s clock is $s$ and it will increment its clock to value $t$ on the next
cycle it processes. Imagine that $LP_j$ will send a message to $LP_i$ with time-stamp $v$, $s < v < t$, at the end
of the $(k+4)$th tick. A conservative protocol will ensure that $LP_i$ is idle during ticks $k+1$ through $k+4$.
An optimistic protocol may permit $LP_i$ to advance its clock during these ticks, but will then recognize a
temporal error upon receipt of the message with time-stamp $v$, and roll back. A rollback at $LP_i$ can itself
cause other rollbacks on other LPs, as false messages sent by that LP are undone.

Our goal is not to propose a model that precisely describes all self-initiating parallel simulations, nor
is it to analyze the most general possible class of simulation models. Self-initiating models are by no
means the most common kind of simulations, and many simulations will not have the power of lookahead
that we analyze. However, the analytic modeling of parallel simulations is an art in its infancy. We are
simply trying to shed some light on a tractable style of analysis that produces reasonable (and intuitive)
results. Even so, despite the many preceding qualifications the proposed model bears a close resemblance to
simulations of practical interest. In particular the model accurately describes the behavior of the Ising spin
and multiprocessor cache simulations described earlier.

3  Optimal  Performance 

Finding non-trivial upper and lower bounds on the performance one can achieve in a parallel simulation is
an open question. We derive an upper bound on the performance any protocol can achieve under our model
assumptions, and derive lower bounds on the performance of a new synchronous conservative protocol. Our
bound on the performance of optimistic protocols is independent of the message-cancellation strategy used.

3.1  Upper  Bounds 

We will present an analytic approach that provides upper bounds on optimal performance for a whole family
of lookahead capabilities. Consider any cycle on any LP, and assume that an oracle schedules the processing
of that cycle on the earliest possible tick such that no further messages will be received by that LP with
a time-stamp less than the time at the end of the cycle. Under our assumptions, this scheduling policy is
obviously optimal. Our upper bounds assume the use of this oracle.

Different simulation models have different lookahead abilities. Some models have no lookahead, others
are able to predict ahead one cycle, and some may be able to predict multiple cycles into the future. For example,
the ability of the multiprocessor cache simulation to predict its own future references to global memory is
limited only by the memory required to store its trace. We will categorize these abilities by the number of
cycles that can be predicted. A simulation model will be said to have $J$-cycle full-lookahead if the time and
content of output messages associated with the completion of cycle $k$ can be predicted at the completion of
cycle $(k - J)$. We assume that simulations exploiting $J$-cycle full-lookahead will always "pre-send" a message
$J$ cycles before the message's associated cycle.

The basic approach to constructing an upper bound is simple. The Global Virtual Time [7] at tick $i$
is denoted $GVT(i)$; this quantity is the least clock value among all LPs at tick $i$, and is typically used
to gauge the progress of the simulation. We desire to bound the limiting rate of simulation time increase,
$\lim_{i \to \infty} GVT(i)/i$. Our approach is to bound $GVT(i)$ by a function $N(i)$, which is the minimum time-stamp
among the "next" messages sent by LPs who received the minimum-time-stamped message at tick $i - 1$.
We then appeal to asymptotic arguments to estimate $\lim_{i \to \infty} N(i)/i$, and hence bound the limiting rate of
simulation time increase.

The bound is constructed as follows. Let $t_{\min}(i)$ be the least time-stamp among all messages sent at tick
$i$. This value need not be equal to $GVT(i)$, because the LP with least clock may not have sent a message,
being prevented from doing so by the knowledge that an impending message will arrive with time-stamp
less than its next cycle time. Let $r(i,j)$ be the index of the $j$th LP (among $K$) who receives the message
with time-stamp $t_{\min}(i)$ at tick $i$. Consider any $LP_{r(i,j)}$, and suppose time $t_{\min}(i)$ falls within the simulation
time span encompassed by its $n_j$th cycle. The next message $LP_{r(i,j)}$ sends cannot have a time-stamp larger
than $C_{r(i,j)}(n_j + J)$, the time associated with the end of its $(n_j + J)$th cycle. The gap of simulation time
between $t_{\min}(i)$ and $C_{r(i,j)}(n_j + J)$ is composed of the sum of a number of random variables: a cycle residual
$C_{r(i,j)}(n_j) - t_{\min}(i)$, plus $J$ cycle time random variables (a $J$-fold convolution of $\mathcal{F}$). This is illustrated by
Figure 1.
Figure 1: Cycle residual and lookahead cycles for $LP_{r(i,j)}$ in a model with $J = 4$ cycle full-lookahead


$t_{\min}(i+1)$ cannot be larger than the least time-stamp on any message sent at the end of tick $i + 1$ by
one of $LP_{r(i,1)}, \ldots, LP_{r(i,K)}$. Call this latter time-stamp $N(i+1)$. Observe that we may write

$$N(i+1) = t_{\min}(i) + M_{K,J}(i+1),$$

where

$$M_{K,J}(i+1) = \min_{1 \le j \le K} \{ C_{r(i,j)}(n_j) - t_{\min}(i) + \mathcal{F}_j(J) \},$$

$\mathcal{F}_j(J)$ being a $J$-fold convolution of $\mathcal{F}$ random variables. Finally, observe that $N(i+1) \ge GVT(i+1)$. Since
our object is to bound $\lim_{i \to \infty} GVT(i)/i$, it will suffice to bound $\lim_{i \to \infty} N(i)/i$.

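Observe that, for all $i > 0$,

$$
\begin{aligned}
N(i)/i &= \sum_{j=1}^{i} \bigl( N(j) - N(j-1) \bigr)/i \\
       &= \sum_{j=1}^{i} \bigl( t_{\min}(j-1) + M_{K,J}(j) - N(j-1) \bigr)/i \\
       &\le \sum_{j=1}^{i} \bigl( N(j-1) + M_{K,J}(j) - N(j-1) \bigr)/i \\
       &= \sum_{j=1}^{i} M_{K,J}(j)/i. \qquad (1)
\end{aligned}
$$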

The appendix gives heuristic reasons for expecting that if $\mathcal{F}$'s tails aren't too large (e.g., if $\mathcal{F}$ is NBUE or has
an increasing hazard rate function), then it is reasonable to assume that the sequence $\{M_{K,J}(i)\}$ converges
into a wide-sense stationary process having finite correlation time [8]. While the supposition is technical, for
our purposes here it implies that the limiting value of sum (1) converges to $\Psi(K,J) = \lim_{i \to \infty} E[M_{K,J}(i)]$.
$\Psi(K,J)$ then bounds the limiting rate of increase in GVT.

The  preceding  discussion  leads  to  our  first  proposition. 

Proposition 1 For every tick $i$ let $t_{\min}(i)$ be the least time-stamp among all messages sent at the end of
tick $i$, and let $LP_{r(i,1)}, \ldots, LP_{r(i,K)}$ be the set of LPs who receive the $t_{\min}(i)$-time message. Let $n_j$ be the
cycle index of the $LP_{r(i,j)}$ cycle containing time $t_{\min}(i)$, and $\mathcal{F}_j(J)$ be a convolution of $J$ random variables
having distribution $\mathcal{F}$. Define

$$M_{K,J}(i+1) = \min_{1 \le j \le K} \{ C_{r(i,j)}(n_j) - t_{\min}(i) + \mathcal{F}_j(J) \},$$

and let $\Psi(K,J) = \lim_{i \to \infty} E[M_{K,J}(i)]$. If the sequence $M_{K,J}(1), M_{K,J}(2), \ldots$ converges to a wide-sense
stationary process with finite correlation time, then

$$\lim_{i \to \infty} GVT(i)/i \le \Psi(K,J).$$


A second proposition follows from the observation that a simulation advancing time at rate $\gamma$ has an average
processor utilization of $\gamma/\mu$, where $\mu$ is the mean cycle length.

Proposition 2 Let the conditions of Proposition 1 be satisfied, and let $\mu$ be the mean of $\mathcal{F}$. Then the average
processor utilization is no greater than $\Psi(K,J)/\mu$.


We must estimate $\Psi(K,J)$ before these propositions yield any insight on performance. Reconsider the
definition of $M_{K,J}(i)$. One takes the minimum of $K$ random variables; each random variable includes the
excess time after $t_{\min}(i)$ of the cycle containing $t_{\min}(i)$. We call this time difference a residual. A similar
concept is studied in renewal theory, the residual life of a renewal process. The difference between our
residuals and those of renewal theory is that our $t_{\min}(i)$ is itself variable, whereas renewal theory considers the
residual following a constant time $t$. However, in the Appendix we show that if $\mathcal{F}$ is non-lattice and $t_{\min}(i)$
is independent of $LP_{r(i,j)}$, then the limiting residual life has the same distribution as that derived in renewal
theory. This limiting distribution is the equilibrium distribution [24] of $\mathcal{F}$, called $\mathcal{F}_e$. It is not completely
unreasonable to assume for the purposes of approximation that $t_{\min}(i)$ is independent of $LP_{r(i,j)}$,
due to the fact that the set of recipients of the $t_{\min}(i)$-time message were chosen uniformly at random from
the entire collection of LPs. To the extent that this is a reasonable approximation, as $k$ grows, $M_{K,J}(k)$
increasingly becomes the minimum of $K$ independent and identically distributed (iid) random variables, each
the sum of an $\mathcal{F}_e$ random variable and an independent $J$-fold convolution of $\mathcal{F}$. Alternately, if $\mathcal{F}$ is geometric
then its residual life has the same geometric distribution, due to the memoryless property.

Our assumption of random communication patterns is now needed again. When the number of LPs is
large compared to $K$, and when the partners of each communication are chosen randomly, we may take
the $K$ random variables comprising $M_{K,J}(k)$ as being independent. This is not rigorously true, as there is
a very slight dependence of the residuals on the fact that their LPs did not send the $t_{\min}$-time message;
since this is true of all but one LP in the entire system, the assumption of independence is reasonable. Our
approximation of $\Psi(K,J)$ is denoted $\Lambda(K,J)$, and is given by

$$\Psi(K,J) \approx E[\min\{K \text{ iid } \mathcal{F}_e + \mathcal{F}(J) \text{ random variables}\}] = \Lambda(K,J). \qquad (2)$$

Throughout  this  paper  we  will  implicitly  assume  the  validity  of  this  approximation  as  a  hypothesis  to 
each  proposition.  The  form  of  the  approximation  is  especially  nice,  as  it  permits  us  to  analyze  some  special 
cases. 
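As a rough numerical companion to approximation (2), the sketch below estimates the expected minimum by Monte Carlo sampling. It assumes, purely for illustration, that the cycle distribution is exponential, in which case the equilibrium distribution is the same exponential by memorylessness. The function name `estimate_lambda`, the sample count, and the seed are hypothetical choices, not taken from the paper.

```python
import random

def estimate_lambda(K, J, mean, samples=20000, seed=1):
    """Monte Carlo estimate of E[min of K iid (residual + J-fold convolution)].

    Assumption: cycle lengths are exponential with the given mean, so the
    residual (equilibrium) distribution is that same exponential.
    """
    rng = random.Random(seed)
    rate = 1.0 / mean                      # expovariate expects a rate
    total = 0.0
    for _ in range(samples):
        total += min(
            rng.expovariate(rate)                            # cycle residual
            + sum(rng.expovariate(rate) for _ in range(J))   # J lookahead cycles
            for _ in range(K)                                # one term per recipient
        )
    return total / samples

# No-lookahead estimate for fanout 4 and mean cycle length 2.0.
no_lookahead = estimate_lambda(K=4, J=0, mean=2.0)
# One-cycle full-lookahead estimate for the same fanout and mean.
one_cycle = estimate_lambda(K=4, J=1, mean=2.0)
```

Dividing either estimate by the mean cycle length gives the corresponding approximate utilization bound of Proposition 2. Other distributions would require sampling the residual from the appropriate equilibrium distribution rather than from the cycle distribution itself.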


3.1.1  NBUE  Distributions 

The case of $J = 0$ is of special interest, as it concerns simulations with no lookahead ability. Furthermore,
consider simulations where $\mathcal{F}$ is non-lattice and New Better Than Used in Expectation (NBUE) [24]⁴. Many
common distributions are NBUE, including normals truncated to be positive, gammas, Weibulls, and sums
of nonnegative constants with exponentials. When $\mathcal{F}$ is NBUE, then $\mathcal{F}_e$ is dominated stochastically⁵ by the
exponential with mean $\mu$. Thus, if we replace each $\mathcal{F}_e$ random variable in the definition of $\Lambda(K,0)$ with an
exponential having mean $\mu$, the resulting mean $\mu/K$ is at least as large as $\Lambda(K,0)$.
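The mean of the minimum of $K$ independent exponentials quoted above follows from a standard calculation, sketched here with $E_1, \ldots, E_K$ as illustrative names (not the paper's notation) for iid exponentials with mean $\mu$:

```latex
\Pr\{\min(E_1,\dots,E_K) > t\}
  = \prod_{j=1}^{K} \Pr\{E_j > t\}
  = e^{-Kt/\mu},
\qquad
E\bigl[\min(E_1,\dots,E_K)\bigr]
  = \int_0^\infty e^{-Kt/\mu}\,dt
  = \frac{\mu}{K}.
```

Because the minimum is a nondecreasing function of its arguments, replacing each residual by a stochastically larger exponential can only increase the expected minimum, which is why $\mu/K$ serves as an upper bound.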

Proposition 3 Suppose that $\mathcal{F}$ is non-lattice and NBUE. Then $\Lambda(K,0) \le \mu/K$. The optimal processor
utilization in a no-lookahead simulation where $\mathcal{F}$ is NBUE is no greater than $1/K$.


This result shows the strong influence that $K$ has on performance when $\mathcal{F}$ is non-lattice: it limits processor
utilization to $1/K$. If $K$ remains proportional to $N$ as $N$ increases we have the following result.

4 A non-negative random variable $X$ is NBUE if for all $t > 0$, $E[X - t \mid X > t] \le E[X]$. In other words, the expected residual life
of $X$ is never greater than the expected value of $X$.

5 $X$ is said to dominate $Y$ stochastically if $\Pr\{X > t\} \ge \Pr\{Y > t\}$ for all $t$.
Proposition 4 Let $\mathcal{F}$ be non-lattice and NBUE, and suppose $K \ge \alpha N$ for all $N$ in a no-lookahead
simulation. Then under any protocol and for any $N$, the average number of total cycles processed by the system
per tick cannot exceed $1/\alpha$.


$K$ equals $N$ in the cache simulation we have already described, because one LP's global reference is sent to
all other LPs. The conclusions of Proposition 4 apply whenever a synchronization protocol treats the model
as having no lookahead.

3.1.2  Geometric  Distribution 

Propositions 3 and 4 depend on $\mathcal{F}$ being non-lattice. When $\mathcal{F}$ is geometrically distributed with mean
$\mu = 1/p$, then the residuals defining $\Lambda(K,0)$ are also geometric with mean $1/p$. It is straightforward to
compute $\Lambda(K,0)$ as the expected minimum of $K$ independent geometrics. This minimum has the same
distribution as one geometric with mean $1/(1 - (1-p)^K)$. Observe that this mean is always at least one; it
cannot diminish arbitrarily as $K$ is increased. When either $1 - p$ is small or $K$ is large the mean is very close to
one, leading us to the next proposition.

Proposition 5 Suppose that T is geometric with mean μ = 1/p. Then

$$\Psi(K, 0) = \frac{1}{1 - (1 - p)^K}$$

and processor utilization is no greater than p/(1 − (1 − p)^K). Thus, as (1 − p)^K → 0, processor utilization
is no greater than p.
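The closed form for the geometric case is easy to evaluate. The sketch below is illustrative only (the function names are assumptions): it computes the mean of the minimum of K independent geometrics with success probability p, and the resulting utilization bound p/(1 − (1 − p)^K).

```python
def expected_min_geometric(p, k):
    """Mean of the minimum of k independent geometrics on {1, 2, ...} with parameter p."""
    return 1.0 / (1.0 - (1.0 - p) ** k)


def utilization_bound_geometric(p, k):
    """Upper bound on utilization: the expected minimum divided by the mean 1/p."""
    return p * expected_min_geometric(p, k)


if __name__ == "__main__":
    for k in (1, 2, 10, 100):
        # The bound starts at 1 for k = 1 and falls toward p as k grows.
        print(k, expected_min_geometric(0.5, k), utilization_bound_geometric(0.5, k))
```

As K grows the expected minimum approaches one from above, so the utilization bound approaches p rather than zero.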


3.1.3  Constant  plus  Exponential  Distributions 

Larger upper bounds on utilization are possible given better lookahead ability. Suppose that a simulation
has one-cycle full-lookahead ability, and consider the family of distributions where a constant b is added to
an exponential with mean μ_T. Within this family we can decrease the variability by increasing b, but still
retain the tractability of the exponential. This family of random variables is NBUE. In the Appendix we
show that under these assumptions

■Ol,.l)<l*lJ  a - 5; -  "1. 

The interesting thing to note about this bound is that it decreases in 1/√K rather than in 1/K, as in
the no-lookahead case. This suggests that really significant performance gains may be possible when K is
large, by exploiting one-cycle full-lookahead. However, the fact that we have increased an upper
bound on performance does not necessarily imply that performance itself must increase. In the following
section we address this issue by deriving a lower bound on optimal performance under the assumption of
full lookahead for J ≥ 1 cycles.

3.2  Lower  Bounds

We now derive a lower bound on optimal performance for simulation models with full-lookaheads of J ≥ 1
cycles. Our approach is to view synchronization as a scheduling problem, and derive the performance one
obtains using a particular scheduling strategy. Being sub-optimal, this performance provides a lower bound
on optimal performance. The strategy we study forms the basis for a conservative synchronization protocol.

Consider a simulation model with J-cycle full-lookahead. Recall that we exploit J-cycle full-lookahead
by requiring an LP who completes cycle m to predict and send the message associated with the end of cycle
m + J. Suppose LP_i last executed cycle m, and knows (through some as yet unspecified means) that it will
not receive any further messages with a time-stamp t or smaller, t falling within its (m + k)th cycle. LP_i
may safely compute cycles m + 1 through m + k − 1, and in doing so predict the messages associated with
the completion of its cycles m + 1 + J through m + k + J − 1. The idea behind our scheduling strategy is
to define a window by defining this t. All LPs may then advance as described above, whereupon we define
a new t and hence a new window. Our strategy defines and processes each window with the following steps.

1. For each LP determine the time of the next message it will send. If LP_i last evaluated cycle m, the
time of the next message it will send is C_i(m + J + 1). Compute the minimum such time c_min among all
LPs; c_min is called the ceiling.

2. Each LP computes all its cycles with termination times strictly less than c_min. For each cycle n that
is so processed, the LP predicts and sends the message associated with the completion of cycle n + J.

3. Each LP accepts the messages sent in the previous step.

This process is repeated until the simulation termination condition is reached.
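The three window steps above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' implementation: cycle time increments are pre-sampled (standing in for full lookahead), message contents are ignored, and the names `simulate_windows` and `draw_increment` are hypothetical. It requires at least one cycle of lookahead and strictly positive increments.

```python
import random


def simulate_windows(num_lps, lookahead, draw_increment, num_windows, seed=0):
    """Run the ceiling-based window strategy; return (ceilings, max cycles per LP per window)."""
    if lookahead < 1:
        raise ValueError("this sketch needs at least one cycle of lookahead")
    rng = random.Random(seed)
    # completion[i][n] is the completion time of cycle n at LP i; cycle 0 ends at time 0.
    completion = [[0.0] for _ in range(num_lps)]
    last = [0] * num_lps  # index of the last cycle each LP has evaluated

    def completion_time(i, n):
        # Lazily extend LP i's pre-sampled completion times up to cycle n.
        times = completion[i]
        while len(times) <= n:
            times.append(times[-1] + draw_increment(rng))
        return times[n]

    ceilings, max_cycles = [], 0
    for _ in range(num_windows):
        # Step 1: the ceiling is the earliest next-message time over all LPs.
        ceiling = min(completion_time(i, last[i] + lookahead + 1) for i in range(num_lps))
        # Step 2: each LP evaluates every cycle that ends strictly before the ceiling.
        for i in range(num_lps):
            done = 0
            while completion_time(i, last[i] + 1) < ceiling:
                last[i] += 1
                done += 1
            max_cycles = max(max_cycles, done)
        # Step 3 (accepting messages) has no effect here because contents are not modeled.
        ceilings.append(ceiling)
    return ceilings, max_cycles


if __name__ == "__main__":
    # Increments drawn as a constant 0.25 plus an exponential with mean 1 (illustrative values).
    ceilings, max_cycles = simulate_windows(64, 1, lambda r: 0.25 + r.expovariate(1.0), 1000)
    print("average ceiling advance per window:", ceilings[-1] / len(ceilings))
    print("most cycles any LP ran in one window:", max_cycles)
```

In this sketch no LP ever runs more than `lookahead` cycles in one window, because its next-message time is never below the ceiling.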

The performance of this mechanism is derived as follows. Let c_min(j) be the ceiling computed during the
jth window. The asymptotic rate at which simulation time advances is identical to the asymptotic rate at
which c_min(j) advances. Consider the jth window: every LP_i computes all as-yet-unprocessed cycles with
completion times less than c_min(j). Let m_i be the last cycle so processed by LP_i. Note that by the end of
the window, LP_i will have sent messages associated with cycles up to m_i + J. The time of the next message
LP_i will send can be expressed as the sum of c_min(j), the residual of the cycle containing c_min(j), and the
time increments of the following J cycles. Therefore, the difference between the time of LP_i's next
message and c_min(j) is composed of a cycle residual plus a J-fold convolution of cycle time increments. We
have already seen in §3.1 that as j grows large this difference may be approximated as a random variable
having the distribution of T_e + T(J). Then c_min(j + 1) can be expressed as c_min(j) plus the minimum of N
such random variables. This shows that

$$E[c_{\min}(j + 1) - c_{\min}(j)] \to \Psi(N, J) \quad \text{as } j \to \infty.$$

No LP can process more than J cycles in a window, because it can never advance beyond the time of
the next message it will send (J cycles distant), computed in step 1 of the window processing. Ψ(N, J)/J
consequently bounds the limiting rate at which simulation time advances from below.

Proposition 6 If the suppositions of Proposition 1 are met, then the limiting rate of simulation time advance
using lookahead scheduling on a J-cycle full-lookahead simulation model is at least Ψ(N, J)/J. The limiting
processor utilization is at least Ψ(N, J)/(Jμ).


Now consider the special case of J = 1, and the b + exp{μ_T} distribution. In the Appendix we show that


(4) 


implying  that  p ,  the  average  processor  utilization  achieved,  is  at  least 

h  +  t'r  \/ 


P  > 


f  /  * 

=  - N  where  r  =  b/iiT. 

r  +  1 

This shows that the extreme conservatism of having every LP block on the c_min-time messages can be highly
tempered. If r = b/μ_T is at all significant the utilizations are quite good. For example, if r = 0.25 we still
get at least 20% utilization. Increase r to 1 and we are assured of 50% utilization; r = 10 delivers 91%
utilization.
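The quoted percentages agree with the N-independent quantity r/(r + 1). The sketch below assumes that simplification, that is, it ignores any additional nonnegative term in the bound, and the function name is illustrative.

```python
def utilization_floor(r):
    """N-independent floor r / (r + 1) on utilization, where r = b / mu_T (assumed simplification)."""
    return r / (r + 1.0)


if __name__ == "__main__":
    for r in (0.25, 1.0, 10.0):
        # Print the floor as a percentage for each ratio quoted in the text.
        print(f"r = {r}: at least {100 * utilization_floor(r):.0f}% utilization")
```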

The case where T is geometric is also of interest. Ψ(N, J) is composed of the minimum of N random
variables, each the sum of J geometric random variables. Each geometric is at least as large as one, implying
that Ψ(N, J) ≥ J. If the geometrics have mean μ = 1/p, Proposition 6 shows that average processor
utilization is at least 100p%.


4  Analysis  of  Optimistic  Protocol 

We next turn to a similar analysis of optimistic protocols. These protocols are complex, especially with
regard to the effects of cascading rollbacks. It appears to be a formidable task to put a lower bound on
the rate at which an optimistic protocol advances simulation time. However, it is easy to extend the ideas
of the previous section to put an upper bound on this rate. Indeed, our ideas of focusing the analysis on
the costs along the critical path are mirrored in Lipton and Mizell's proof that conservative methods cannot
arbitrarily outperform Time Warp.

Successful conservative schemes exploit simulation model characteristics such as non-preemptive queueing,
and precalculation of event duration times. One of the great hopes for optimistic protocols is that
they can be implemented without using explicit knowledge about the simulation model. Consequently, a
"model-independent" implementation cannot assume any lookahead. The results of the previous section
show that this assumption immediately limits the processor utilizations that are possible. Naturally, the
cost of state-saving and rollback limits utilizations even more. To be sure, simulations using Time Warp may
exploit model information; indeed, users who have and use Time Warp implementations have suggested they
should do so [2]. Note however that these types of optimizations do not alleviate the burden of state-saving.
The arguments to follow assume that Time Warp treats the simulation model as though it has no lookahead.

Under Time Warp an LP rolls back if it receives a message with a time-stamp less than its clock. We
assume that an LP's state is always saved prior to the execution of a cycle, and model that cost with ticks
of length C_S > 1, as compared to the earlier ticks of length 1. We suppose that a rollback requires C_R time,
measured in units where processing an event (without state-saving) takes unit time. The model bounds
the rate of simulation time increase on the critical path. The only assumption used by the analysis is that
a "late" message causes a rollback at the receiving processor; any effects due to rollback propagation are
ignored. A rolled-back processor will re-execute cycles it was rolled past. For this reason our analysis is
independent of whether "aggressive" or "lazy" message cancellation [23] is used. By assuming that cycles
passed by the rollback are reevaluated, the analysis assumes that "jump forward" mechanisms [6] are not
used. Under an optimistic protocol a given cycle of an LP may be processed a number of times before it is
"cast". To avoid complications we assume that the length in simulation time of a cycle is the same every
time the cycle is processed.

Suppose that a rollback is initiated at LP_i at tick k. Barring any further interruptions due to cascading
anti-messages, the rollback completes at tick k + C_R. At tick k + C_R + 1 the LP rejoins the simulation and
communicates its K messages. The idea behind the analysis of optimal performance in §3 was to bound the
advance in simulation time between two ticks by looking at the LPs which receive the message with least
time-stamp at a tick. Exactly the same idea applies here, except that the increase in simulation time must
be measured over more than one tick.

Consider the set of LPs who receive the message with time-stamp t_min(i). Let LP_min be the LP in this
set with the minimum cycle residual (measured from t_min(i)), and recall that N(i) is the tick at which LP_min
sends its next message. We will use these definitions to partition the running time into phases that measure
the time between an LP's receipt of a t_min-time message, and the next tick at which the LP completes
a cycle and sends a message. The idea of a phase is to measure the rollback delay caused by receipt of
the t_min-time message. Phase 1 encompasses ticks 1 through N(1) − 1. Phase 2 encompasses ticks N(1)
through N(N(1)) − 1, phase 3 encompasses ticks N(N(1)) through N(N(N(1))) − 1, and so on. When T
is non-lattice, any LP receiving the t_min(i)-time message at the end of tick i must roll back, or already be
rolling back. If it is already rolling back and if the transmission of the t_min-time message to that LP is
independent of the fact it is already rolling back, then on average the rollback is half-way completed. In
this case the mean number of ticks in a phase must be at least 1 + C_R/2. This argument requires T to be
non-lattice, for when T is discrete it is possible for an LP receiving the least time-stamp message to have
the same clock value as the time-stamp, possibly making a rollback unnecessary.

Let P(i) be the number of phases that have completed by tick i, let r_i be the tick that completes the phase
containing i, and let t_j be the tick that completes the jth phase. Then


( 1 1  I  ( i) / i  <( i\  I  ( c'r ) / i  5; 

< 


< 


e;=v  uj 


We have already identified conditions under which the leftmost quotient converges to Ψ(K, 0). Under similar
conditions on the length of phases the rightmost quotient will converge to the reciprocal of the mean number
of ticks per phase. Recall that each tick is a factor of C_S slower due to the cost of state-saving. This
argument proves our next proposition.


Proposition 7 Suppose the sequences {M_{K,J}(i)} and {t_{j+1} − t_j} converge to respective wide-sense stationary
processes with finite correlation time. Let C_Rh = C_R/2. If Time Warp treats a simulation model as though
it has no lookahead then

$$\lim_{i \to \infty} T(i)/i \le \begin{cases} \dfrac{\Psi(K, 0)}{C_S (C_{Rh} + 1)} & \text{if } T \text{ is non-lattice} \\[2ex] \dfrac{\Psi(K, 0)}{C_S} & \text{if } T \text{ is discrete.} \end{cases}$$

Consequently, the processor utilization is no greater than Ψ(K, 0)/(μ C_S (C_Rh + 1)) if T is non-lattice, and is
no greater than Ψ(K, 0)/(μ C_S) when T is discrete.


An easy upper bound can be put on Time Warp even when it exploits lookahead, because every LP
incessantly saves state. Each tick some LP executes a cycle, and first saves state, so that every processing
tick is delayed by a state-save. Therefore, Time Warp's performance cannot be any better than a factor of
C_S worse than optimal.

Proposition 8 The optimal rate of simulation time advance under Time Warp on a simulation model having
J-cycle full-lookahead is no greater than Ψ(K, J)/C_S; average processor utilization is no greater than Ψ(K, J)/(μ C_S).


5  A  Conservative  Protocol 

The promise of (sometimes) good performance achieved by the scheduling strategy described in §3.2 suggests
its use as the basis for a synchronization protocol. Unlike many conservative protocols, this one is
synchronous, in that the computation of the ceiling value implicitly contains a global synchronization among
processors. This synchronization is all that is needed to implement the policy. The lower bound on performance
we derived in §3.2 must change to accommodate the cost of computing the ceiling. Define C_G > 1
so that a processor is engaged in synchronization overhead 100(1 − 1/C_G)% of the time. Equivalently, one
can view the ticks as being C_G times the length of our earlier ticks, due to the overhead of synchronization.
Depending on the granularity of the event computation the delay cost of synchronization can be quite
small, as most richly connected architectures such as a hypercube can compute a global minimum in log N
steps. Some architectures such as the second generation Connection Machine already have hardware support
for common global reductions like the minimum. Including this synchronization cost, the lower bound on
processor utilizations becomes Ψ(N, J)/(C_G J μ).
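To make the log N claim concrete, the sketch below simulates a recursive-doubling minimum on a hypercube: in each step every node exchanges its running minimum with the neighbor across one dimension. It is an illustrative sequential model, not code for any particular machine, and the function names are assumptions. A small helper also evaluates the overhead fraction 1 − 1/C_G.

```python
def hypercube_global_min(values):
    """Return (per-node results, number of exchange steps) for a global minimum."""
    n = len(values)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("number of nodes must be a power of two")
    current = list(values)
    steps = 0
    bit = 1
    while bit < n:
        # Node i pairs with node i XOR bit, its neighbor across this dimension.
        current = [min(current[i], current[i ^ bit]) for i in range(n)]
        bit <<= 1
        steps += 1
    return current, steps


def overhead_fraction(c_g):
    """Fraction of time spent synchronizing when ticks are c_g times longer."""
    return 1.0 - 1.0 / c_g


if __name__ == "__main__":
    result, steps = hypercube_global_min([5, 3, 8, 1, 9, 2, 7, 4])
    print(result, steps)           # every node holds the minimum after log2(8) steps
    print(overhead_fraction(2.0))  # the C_G = 2 case used in the tables
```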

Consider again a simulation with 1-cycle full-lookahead with b + exp{μ_T} time increments. Using approximation
(3) and inequalities (3) and (4) we can put a lower bound on the ratio of the conservative protocol's
utilization to optimal utilization. Table 1 plots this bound as a function of b and K, for fixed μ_T = 1, C_G = 2
and N = 65536.

Relatively good performance is possible when b is non-trivial relative to μ_T and/or when K is large, even
though synchronization overheads are 50%. However, if the upper bound on optimal performance is at all
tight there is clearly room for significant improvement. It is here that the extreme conservativeness of having
every LP wait for the least-time future message hurts. It may be that more complex protocols such as the
bounded-lag protocol [13] could significantly boost performance in this region.

We can determine situations where this conservative protocol achieves better performance than Time
Warp. Assume the validity of approximation (3), and assume that T has the b + exp{μ_T} distribution.

Table 1: Approximated lower bound on fraction of optimal performance achieved by the conservative protocol
when μ_T = 1, C_G = 2 and N = 65536.


This distribution is NBUE, whence by Propositions 3 and 7, 1/(C_S(C_Rh + 1)K) is an upper bound on Time
Warp's utilization. From this, Proposition 6, and equation (4) we determine that the conservative protocol
achieves better performance than Time Warp whenever

Cq  <  +  Pry/jfT) 

Cs(C  Rh  +  1 )  ~  f>  +  Hi 

Note  that  this  inequality  assumes  that  Time  Warp  is  not  exploiting  lookahead. 
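The Time Warp side of this comparison, the upper bound 1/(C_S(C_Rh + 1)K) with C_Rh = C_R/2, is simple to tabulate. The sketch below is illustrative only; the function name and the sample parameter values are assumptions, and the bound applies to the non-lattice NBUE case without lookahead.

```python
def time_warp_utilization_bound(c_s, c_r, fanout):
    """Upper bound 1 / (C_S * (C_R / 2 + 1) * K) on Time Warp utilization."""
    c_rh = c_r / 2.0  # a rollback already in progress is half done on average
    return 1.0 / (c_s * (c_rh + 1.0) * fanout)


if __name__ == "__main__":
    for c_s in (2.0, 10.0):
        for k in (1, 4, 16):
            # Rollback cost set to zero here, which favors Time Warp.
            print(c_s, k, time_warp_utilization_bound(c_s, 0.0, k))
```

Raising either the state-saving cost or the fan-out lowers the bound in direct proportion.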

Estimates for individual process state sizes in the near term at the Jet Propulsion Lab are from 1K up
to 1M bytes [1]. For 4K state sizes, it is estimated that 90% of a processor's time could be devoted to
saving state. Using Time Warp on these production problems without the benefit of hardware accelerators,
C_S = 10 is apparently a reasonable value.

Physical processes modeled by LPs very rarely have zero duration times. Many modeled processes exhibit
a fixed startup cost, e.g. chucking a bit into a drill in a manufacturing simulation. Therefore non-zero values
of b seem reasonable in practice. Relatively large values of K are also common, especially in domain oriented
simulations where domain sectors are LPs that communicate in a nearest neighbor pattern.

Table 2 plots the ratio of the lower bound on the conservative method's utilization to the upper bound
on Time Warp utilization, as a function of b and K for fixed μ_T = 1, N = 65536, C_G = 2, and C_Rh = 0.
One set of data assumes that C_S = 10. Another assumes that C_S = 2, making state-saving comparable to
the cost of a global synchronization.

The performance difference is not so great when T is geometric. Using Propositions 5 and 7 we bound
Time Warp's utilization from above by p/(C_S(1 − (1 − p)^K)). A simple lower bound on the utilization of
the conservative protocol is p/C_G. Table 3 plots the ratio of these bounds for the same set of parameter
values as did Table 2. The values of p are chosen to yield the same mean values of T as those in Table 2.
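The ratio described here has a closed form, C_S(1 − (1 − p)^K)/C_G. The following sketch computes it from the two bounds; the function names and the sample values of p and K are illustrative assumptions, not the entries of Table 3.

```python
def time_warp_upper(p, k, c_s):
    """Upper bound on Time Warp utilization for geometric increments."""
    return p / (c_s * (1.0 - (1.0 - p) ** k))


def conservative_lower(p, c_g):
    """Simple lower bound on the conservative protocol's utilization."""
    return p / c_g


def bound_ratio(p, k, c_s, c_g):
    """Ratio of the conservative lower bound to the Time Warp upper bound."""
    return conservative_lower(p, c_g) / time_warp_upper(p, k, c_s)


if __name__ == "__main__":
    for c_s in (10.0, 2.0):
        for k in (1, 4, 16):
            # A ratio above one means the conservative bound exceeds the Time Warp bound.
            print(c_s, k, bound_ratio(0.5, k, c_s, 2.0))
```

The ratio never exceeds C_S/C_G, which is why comparable overheads leave little separation between the two bounds in the geometric case.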
Table 2: Comparison of conservative protocol and Time Warp when T has the b + exp{μ_T} distribution.
μ_T = 1, N = 65536, C_Rh = 0. High state-saving costs modeled with C_S = 10, low state-saving costs with
C_S = 2.


Despite the better showing by Time Warp, in most of the cases shown the conservative method compares
favorably with Time Warp. The insensitivity of the conservative method to fanout is a direct consequence
of its implicit assumption that the next message to an LP can come from anywhere. This is
equivalent to assuming a fanout of N.

Simulation studies suggest that our upper bound on Time Warp's performance is somewhat larger than
the observed performance. Figure 2 illustrates the point by plotting the measured (simulated) performance of
Time Warp and our conservative method on the analytic model. Comparable overheads are used (C_S = C_G =
2); the conservative method exploits a 1-cycle full-lookahead model while Time Warp does not. Aggressive
cancellation is used in the Time Warp simulation.
Table 3: Comparison of conservative protocol and Time Warp when T has geometric distribution. μ_T = 1,
N = 65536, C_Rh = 0. High state-saving costs modeled with C_S = 10, low state-saving costs with C_S = 2.


6  General  Application 

The  conservative  protocol  described  earlier  has  broader  application  than  just  to  our  simple  analytic  model. 
Its principles form the basis of a parallel simulation testbed we have implemented on an Intel iPSC/2 [10].
The key idea to making efficient use of such a coarse-grained machine is aggregating large numbers of LPs for
evaluation on each processor. One advantage to aggregation is that a model which suffers very low processor
utilizations when each LP has its own processor can achieve good processor utilizations on a coarse-grained
machine. For example, consider a model with 65536 LPs, which gets 1% utilization on an architecture with
65536 processors. Evaluate that model on a machine with 64 processors, and on average each processor
will have more than 10 events to process each synchronization window. Indeed, the results developed in the
framework of a more complex stochastic model show that given 1-cycle full-lookahead, performance of our
method approaches optimality as the size of the problem is increased relative to the architecture [20]. These
Figure 2: Empirical comparison of Time Warp with aggressive cancellation and conservative protocol.
Conservative protocol exploits 1-cycle full-lookahead, Time Warp does not. Overheads are equivalent: C_G = 2,
C_S = 2. N = 65536, communication patterns are random.  = 0.25, μ_T = 1.


conclusions  are  supported  by  experiments  on  the  testbed  where  we  have  achieved  processor  utilizations  in 
t  he  range  of  tiO(X  —  using  32  processors  on  large  queueing  network,  logic  network,  and  cellular  automaton 
simulations.  The  performance  degradation  there  is  not  due  to  blocked  processors,  it  is  due  to  communication 
and  synchronization  overheads. 
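The aggregation example above (65536 LPs at 1% utilization, evaluated on 64 processors) reduces to one line of arithmetic. The sketch below is illustrative and assumes that, with one LP per processor, the utilization equals the fraction of LPs that have an event in an average window; the function name is hypothetical.

```python
def events_per_processor(num_lps, utilization, num_processors):
    """Average events per processor per window after aggregating LPs."""
    events_per_window = num_lps * utilization  # LPs with work in an average window
    return events_per_window / num_processors


if __name__ == "__main__":
    # The example from the text: 65536 LPs, 1% utilization, 64 processors.
    print(events_per_processor(65536, 0.01, 64))
```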

The results reported here are easily adapted to simulation models having only time-lookahead. The
conservative protocol is modified so that promises of future messages are sent as soon as possible, and then
the messages themselves are sent upon completion of the appropriate cycle. The windows are defined as
before (the ceiling is the least next appointment time to be sent), but instead of processing all events in
one pass, the protocol iterates over the window. Each iteration, computations that are assured of no future
messages (as established by the lookahead message times) are performed. These create messages that will
"free" other computations that depend on them. The analysis goes through as before, except that the
increase in simulation time must be amortized over the average number of iterations in a window. We have
not yet firmly established this value, but heuristic arguments suggest that it is O(log N).
7  Conclusions


This paper proposes and analyzes an intuitive model of massively parallel discrete-event simulations. We
derive non-trivial upper and lower bounds on optimal performance for certain classes of simulations, derive
an upper bound on Time Warp's performance, and derive a lower bound on the performance of a newly
proposed conservative method. These results permit a derivation of sufficient conditions for the conservative
method to outperform Time Warp.

Our analysis quantifies the dependence of performance on the time-increment distribution, showing that
distributions with significant constant components lead to good performance. We also determine the sensitivity
of performance to lookahead, and to message fan-out. Unfortunately, our results rest on approximations
which are justified only heuristically, although there is excellent agreement between analytic and empirical
results. Future research may be directed towards firming up the foundations of our approach.

Our results are significant in two ways. To our knowledge this is the first analysis able to analytically
compare the performance of a synchronization protocol on a stochastic model with a non-trivial bound on
the optimal performance one can achieve. It is also significant that we are able to classify simulation models
under which a conservative method has provably good performance. To be sure, there are a large number of
simulation models where our protocol will fail miserably, and there are a large number of models which lack
the lookahead demanded by our method. Nevertheless, a better understanding of the complex behavior of
parallel simulations demands analysis, and this paper is an early effort at providing that analysis.
Appendix

In  this  appendix  we  derive  a  number  of  results  too  detailed  to  include  in  the  body  of  the  paper. 


Limiting  Distribution  of  Residual 


Let X(t) and Y(t) be independent renewal processes having non-lattice inter-renewal distributions F and G
respectively. Let R_X(t) denote the residual life at t of the process X(t): the remaining time until the next
renewal. It is known [24] that as t grows large R_X(t) converges in distribution to F's associated equilibrium
distribution F_e:

$$\Pr\{F_e > x\} = \int_x^\infty \Pr\{F > u\}/\mu \; du$$

where μ is F's mean. Now let S_j be the time of the jth renewal in Y(t). We will sketch an argument showing
why the limiting distribution of R_X(S_j) as j → ∞ is also F_e.
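As a worked illustration of the equilibrium distribution (an added example, not part of the original derivation), take F to be a constant b plus an exponential. The symbol m for the exponential's mean is an assumption introduced here, so that F has mean μ = b + m. Using the standard tail form of the equilibrium distribution, the integral of Pr{F > u}/μ from x to infinity:

```latex
\Pr\{F > u\} =
\begin{cases}
1, & 0 \le u < b \\
e^{-(u-b)/m}, & u \ge b
\end{cases}
\qquad \mu = b + m

\Pr\{F_e > x\} = \frac{1}{\mu}\int_x^\infty \Pr\{F > u\}\,du =
\begin{cases}
\dfrac{(b - x) + m}{b + m}, & 0 \le x < b \\[2ex]
\dfrac{m\,e^{-(x-b)/m}}{b + m}, & x \ge b
\end{cases}
```

Setting b = 0 recovers the exponential tail, reflecting the fact that an exponential is its own equilibrium distribution.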

To show convergence we must demonstrate that for every x > 0 and ε > 0 there exists a j_ε such that for
all j > j_ε,

$$|\Pr\{R_X(S_j) > x\} - \Pr\{F_e > x\}| < \epsilon.$$


Choose any x and ε. Let g_j(t) be the density function of S_j (a j-fold convolution of G). By the independence
of X(t) and Y(t) we may write

$$\Pr\{R_X(S_j) > x\} = \int_0^\infty \Pr\{R_X(t) > x\}\, g_j(t)\, dt.$$

Since g_j integrates to one, it follows that

$$|\Pr\{R_X(S_j) > x\} - \Pr\{F_e > x\}| \le \int_0^\infty |\Pr\{R_X(t) > x\} - \Pr\{F_e > x\}|\, g_j(t)\, dt.$$


Because R_X(t) converges in distribution to F_e as t → ∞, we may choose t_ε so large that for all t > t_ε, the
absolute difference inside the integral above is no greater than ε/2. We may also choose some j_ε so large
that Pr{S_j < t_ε} < ε/2 for all j > j_ε. In this case for all j > j_ε

$$\int_0^\infty |\Pr\{R_X(t) > x\} - \Pr\{F_e > x\}|\, g_j(t)\, dt \le \int_0^{t_\epsilon} g_j(t)\, dt + \int_{t_\epsilon}^\infty (\epsilon/2)\, g_j(t)\, dt < \epsilon/2 + \epsilon/2.$$

This demonstrates that the distribution of R_X(S_j) converges to F_e.


lim_{i→∞} (1/i) Σ_{j=1}^{i} M_{K,J}(j) = Ψ(K, J)

Here we describe reasonable conditions under which the limiting average of the sequence {M_{K,J}(i)} converges
to Ψ(K, J). View the sequence M_{K,J}(1), M_{K,J}(2), ..., as a discrete-time stochastic process {M_{K,J}(i)}. For i
large it is reasonable to assume that this process is stationary in the wide-sense [8], meaning that there exist
values Ψ_{K,J} and C_{K,J} such that E[M_{K,J}(i)] = Ψ_{K,J} < ∞ and E[M_{K,J}(i)²] = C_{K,J} < ∞ for all i sufficiently
large, and that Cov[M_{K,J}(i), M_{K,J}(j)] is a function only of |i − j|. Furthermore, we expect that M_{K,J}(i)
and M_{K,J}(j) should become independent as |j − i| grows. If they become independent quickly enough, we
will have

$$\sum_{j=i+1}^{\infty} \left| E[M_{K,J}(i) M_{K,J}(j)] - \Psi_{K,J}^2 \right| < \infty \qquad (5)$$

If this inequality holds true (for a general wide-sense stationary process) the process is said to have a finite
correlation time.

We have not developed formal arguments that {M_{K,J}(i)} becomes wide-sense stationary with a finite
correlation time. However, we can give heuristic reasons why it is reasonable to assume so. M_{K,J}(i) is the
minimum of K random variables, each comprised of the sum of a residual plus the sum of J cycle times.
Among all these let n_max be the maximum index of a cycle time random variable (or residual) appearing
in M_{K,J}(i). Now suppose the time-increment distribution is exponential. Any M_{K,J}(j) composed entirely
of random variables from cycles with indices greater than n_max is independent of M_{K,J}(i), owing to the
memoryless property of the exponential. Let D be the random number of ticks that pass after i before every
LP has evaluated cycle n_max. Then we have

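$$\sum_{j=i+1}^{\infty} \left| E[M_{K,J}(i) M_{K,J}(j)] - \Psi_{K,J}^2 \right| = \sum_{j=i+1}^{i+D} \left| E[M_{K,J}(i) M_{K,J}(j)] - \Psi_{K,J}^2 \right|.$$

Since M_{K,J}(i) and M_{K,J}(j) have the same distribution, we must have E[M_{K,J}(i) M_{K,J}(j)] ≤ C_{K,J}. Thus

$$\sum_{j=i+1}^{i+D} \left| E[M_{K,J}(i) M_{K,J}(j)] - \Psi_{K,J}^2 \right| \le E[D]\,(C_{K,J} - \Psi_{K,J}^2) < \infty$$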

provided that E[D] is finite. Assuming that a serial simulation will always advance a given finite amount of
simulation time in a finite expected number of cycle executions, E[D] will be finite. This is true because a
serial simulation will always advance the LP with least next message time each tick, and in a finite expected
number of steps will advance each LP at least once. E[D] is no larger than this expectation, and is hence
finite. This argument rests on the fact that M_{K,J}(i) and M_{K,J}(j) become independent once j is large
enough. The independence is an artifact of the exponentiality. Intuitively, if the tail of T cannot become too
large (e.g. if T is NBUE or if it has an increasing hazard rate function), it is reasonable to expect rapidly
diminishing correlation and hence a finite correlation time. Such technical details appear to be difficult to
establish.

The result we want follows if $\{M_{K,J}(i)\}$ is wide-sense stationary with a finite correlation time. Let
$\bar{M}_{K,J}(i)$ be the average $M_{K,J}$ value taken over the first $i$ ticks:

$$\bar{M}_{K,J}(i) = \frac{1}{i} \sum_{j=1}^{i} M_{K,J}(j).$$
Under these conditions we find that

$$\lim_{i \to \infty} E\left[ \left( \bar{M}_{K,J}(i) - \Lambda(K,J) \right)^2 \right] = 0.$$

This form of convergence is sufficiently strong for us to take $\Lambda(K,J)$ as the limiting value of $\bar{M}_{K,J}(i)$.
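
The following is a sketch, not the authors' argument, of why a finite correlation time gives this mean-square limit. It assumes that $E[M_{K,J}(i)] = \Lambda(K,J)$, and it introduces two hypothetical symbols that do not appear in the text: the autocovariance $c(k) = \mathrm{Cov}(M_{K,J}(i), M_{K,J}(i+k))$ with $\sigma^2 = c(0)$, and a correlation time $\tau$ such that $c(k) = 0$ for $|k| > \tau$.

```latex
\begin{align*}
E\left[\left(\bar{M}_{K,J}(i) - \Lambda(K,J)\right)^2\right]
  &= \frac{1}{i^2} \sum_{a=1}^{i} \sum_{b=1}^{i} c(a-b) \\
  &\le \frac{1}{i^2} \sum_{a=1}^{i} \sum_{k=-\tau}^{\tau} |c(k)| \\
  &\le \frac{(2\tau + 1)\,\sigma^2}{i} \longrightarrow 0 \quad \text{as } i \to \infty ,
\end{align*}
% the last step uses |c(k)| <= sigma^2 (Cauchy-Schwarz)
```

The bound decays like $1/i$, so any finite $\tau$ suffices; only the constant depends on it.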


Expected  Minimum  of  Exponential  Sums 

Next we derive upper and lower bounds on the expected minimum of $n$ independent and identically distributed
random variables, each constructed by adding two independent exponentials which need not have the same mean
(this is a slight generalization of an Erlang-2 distribution).

Our approach is to analyze the hazard rate function of a single exponential-sum. We will construct a
random variable with a larger hazard rate, and one with a smaller one. The former is stochastically smaller
than the exponential-sum, hence the minimum of $n$ such is stochastically smaller than the minimum of $n$
exponential-sums. Similarly, the minimum of $n$ independent random variables that stochastically dominate
an exponential-sum will stochastically dominate the minimum of $n$ exponential-sums.

Let $\lambda_1, \lambda_2$ with $\lambda_1 \ge \lambda_2$ be the exponential parameters for exponentials $X_1$ and $X_2$. The hazard rate
function $\lambda_s(t)$ of $X_1 + X_2$ is found by considering a two-stage process where the first stage requires $X_1$
time and the second stage requires $X_2$. Intuitively $\lambda_s(t)$ is the instantaneous probability density associated
with the process finishing at $t$, given that it has not yet finished. Condition on whether $X_1 < t$: if so, then
$\lambda_s(t)$ is $\lambda_2$ because the process is in the second stage; if not, it is zero because the process has not yet
finished the first stage. Thus we have

$$\lambda_s(t) = \left( 1 - \Pr\{X_1 > t \mid X_1 + X_2 > t\} \right) \lambda_2. \qquad (6)$$

An equivalent (but much nastier) expression for $\lambda_s$ is derivable from first principles, taking the quotient of
the density function of $X_1 + X_2$ to the probability that $X_1 + X_2 > t$. Our expression is more convenient, in
that it suggests simple ways in which $\lambda_s(t)$ can be bounded from above and below.

An upper bound on $\lambda_s(t)$ is constructed by observing that the conditional probability in equation (6) is
at least as large as $\Pr\{X_1 > t\} = \exp\{-\lambda_1 t\}$. Thus

$$\lambda_s(t) \le \left( 1 - \exp\{-\lambda_1 t\} \right) \lambda_2.$$

This latter function is concave in $t$ and is hence dominated everywhere by the line tangent to it at $t = 0$:
$h_l(t) = t \lambda_1 \lambda_2$. A random variable with hazard rate function $h_l(t)$ is therefore stochastically dominated by
$X_1 + X_2$.

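A lower bound on $\lambda_s(t)$ is constructed by exploiting the fact that

$$\Pr\{X_1 > t \mid X_1 + X_2 > t\} = \frac{\Pr\{X_1 > t\}}{\Pr\{X_1 + X_2 > t\}} \le \frac{\Pr\{X_1 > t\}}{\Pr\{X_1 + X_3 > t\}} = \frac{1}{1 + \lambda_1 t},$$

where $X_3$ has the distribution of $X_1$. The inequality holds because $\lambda_1 \ge \lambda_2$ makes $X_1 + X_3$ stochastically
smaller than $X_1 + X_2$. Consequently,

$$\lambda_s(t) \ge \left( 1 - \frac{1}{1 + \lambda_1 t} \right) \lambda_2.$$

This function is increasing and concave in $t$. It equals $\lambda_2/2$ when $t = 1/\lambda_1$. Consequently, this function
dominates the piecewise linear function $h_u(t)$ which rises linearly with slope $\lambda_1 \lambda_2 / 2$ until $t = 1/\lambda_1$, and then
is the constant $\lambda_2/2$.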
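
The chain of hazard-rate bounds can be checked numerically. The sketch below is an illustration rather than part of the original analysis: it assumes $\lambda_1 > \lambda_2$ strictly (so the closed-form density and survival function of the sum apply), and all function names are hypothetical. It evaluates the exact hazard rate of $X_1 + X_2$ as density over survival, then tests the ordering $h_u(t) \le \lambda_2(1 - 1/(1+\lambda_1 t)) \le \lambda_s(t) \le \lambda_2(1 - e^{-\lambda_1 t}) \le h_l(t)$ on a grid.

```python
import math


def hazard_sum(t, lam1, lam2):
    """Exact hazard rate of X1 + X2 (independent exponentials, rates lam1 > lam2)."""
    e1, e2 = math.exp(-lam1 * t), math.exp(-lam2 * t)
    density = lam1 * lam2 * (e2 - e1) / (lam1 - lam2)
    survival = (lam1 * e2 - lam2 * e1) / (lam1 - lam2)
    return density / survival


def hazard_upper_curve(t, lam1, lam2):
    """Concave upper bound on the hazard rate: lam2 * (1 - exp(-lam1 * t))."""
    return lam2 * (1.0 - math.exp(-lam1 * t))


def h_l(t, lam1, lam2):
    """Tangent line at t = 0; as a hazard rate it yields the LOWER bound on E[min]."""
    return lam1 * lam2 * t


def hazard_lower_curve(t, lam1, lam2):
    """Concave lower bound on the hazard rate: lam2 * (1 - 1 / (1 + lam1 * t))."""
    return lam2 * (1.0 - 1.0 / (1.0 + lam1 * t))


def h_u(t, lam1, lam2):
    """Piecewise linear minorant; as a hazard rate it yields the UPPER bound on E[min]."""
    return lam1 * lam2 * t / 2.0 if t <= 1.0 / lam1 else lam2 / 2.0


def check_bounds(lam1, lam2, t_max=10.0, steps=1000, tol=1e-12):
    """Return True if the five functions are ordered as claimed on the whole grid."""
    for k in range(1, steps + 1):
        t = t_max * k / steps
        chain = [
            h_u(t, lam1, lam2),
            hazard_lower_curve(t, lam1, lam2),
            hazard_sum(t, lam1, lam2),
            hazard_upper_curve(t, lam1, lam2),
            h_l(t, lam1, lam2),
        ]
        if any(a > b + tol for a, b in zip(chain, chain[1:])):
            return False
    return True


if __name__ == "__main__":
    print(check_bounds(2.0, 1.0))
```

Note the naming convention inherited from the text: $h_l$ is the larger hazard rate and $h_u$ the smaller one, because the subscripts refer to the bounds they produce on the expected minimum.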

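Let $Z_1, \ldots, Z_n$ be $n$ independent random variables with hazard rate $h_u(t)$. The hazard rate of the
minimum of these is simply $n \cdot h_u(t)$. To compute the expected minimum we use a well-known relationship
between a random variable's hazard rate function and its cumulative distribution function. We have

$$E[\text{minimum of } n \text{ iid exponential-sums}] \le \int_0^{\infty} \exp\left\{ -n \int_0^t h_u(s)\,ds \right\} dt = \int_0^{1/\lambda_1} \exp\left\{ -\frac{n \lambda_1 \lambda_2 t^2}{4} \right\} dt + \frac{2}{n \lambda_2} \exp\left\{ -\frac{n \lambda_2}{4 \lambda_1} \right\}.$$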


A lower bound is found similarly. The hazard rate for the minimum of $n$ independent random variables
having hazard rate function $h_l(t)$ is $n\,h_l(t)$. Integrating as we did to derive the upper bound, we determine
a lower bound of

$$\sqrt{\frac{\pi}{2 n \lambda_1 \lambda_2}} \le E[\text{minimum of } n \text{ iid exponential-sums}]. \qquad (8)$$
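
As a sanity check on the two bounds, the sketch below compares them with a numerically integrated value of the expected minimum. It is an illustration under stated assumptions, not the authors' computation: it assumes $\lambda_1 > \lambda_2$ strictly, uses a plain trapezoid rule with a truncation point chosen for moderate parameter values, and evaluates the upper bound by integrating the piecewise linear hazard rate $h_u$ in closed form (ramp part via `math.erf`, constant part as an exponential tail). All function names are hypothetical.

```python
import math


def survival_sum(t, lam1, lam2):
    """Pr{X1 + X2 > t} for independent exponentials with rates lam1 > lam2."""
    e1, e2 = math.exp(-lam1 * t), math.exp(-lam2 * t)
    return (lam1 * e2 - lam2 * e1) / (lam1 - lam2)


def expected_min_numeric(n, lam1, lam2, steps=100_000):
    """E[min of n iid exponential-sums] as the integral of survival**n (trapezoid rule)."""
    t_end = 50.0 / lam2  # assumed truncation point; the integrand is negligible beyond it
    h = t_end / steps
    total = 0.5 * (1.0 + survival_sum(t_end, lam1, lam2) ** n)
    for k in range(1, steps):
        total += survival_sum(k * h, lam1, lam2) ** n
    return total * h


def lower_bound(n, lam1, lam2):
    """Integral of exp(-n * lam1 * lam2 * t**2 / 2) over [0, inf), from hazard n * h_l."""
    return math.sqrt(math.pi / (2.0 * n * lam1 * lam2))


def upper_bound(n, lam1, lam2):
    """Integral of exp(-n * H_u(t)), where H_u is the cumulative hazard of h_u."""
    c = n * lam1 * lam2 / 4.0
    # Ramp part on [0, 1/lam1]: integral of exp(-c t^2).
    ramp = 0.5 * math.sqrt(math.pi / c) * math.erf(math.sqrt(c) / lam1)
    # Constant-hazard part on [1/lam1, inf).
    tail = (2.0 / (n * lam2)) * math.exp(-n * lam2 / (4.0 * lam1))
    return ramp + tail


if __name__ == "__main__":
    for n in (1, 8, 64):
        print(n, lower_bound(n, 2.0, 1.0),
              expected_min_numeric(n, 2.0, 1.0),
              upper_bound(n, 2.0, 1.0))
```

Both bounds scale like $1/\sqrt{n}$ for large $n$, which is the behavior the hazard-rate construction is designed to capture.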


Bounds on $\delta + \exp(\mu_T)$ Distribution

Next we consider some bounds on $\Lambda(K,1)$ derivable when the time-increment distribution is a constant $\delta$
plus an exponential with mean $\mu_T$, and when the simulation has 1-cycle full-lookahead.

$\Lambda(K,1)$ is the expected minimum of $K$ random variables, each comprised of a residual plus a time-increment
value. The residual follows the equilibrium distribution of the $\delta + \exp(\mu_T)$ distribution.
We first consider this equilibrium distribution. Working directly from definitions [24], we determine
that its hazard rate is

$$h(t) = \begin{cases} \dfrac{1}{\delta + \mu_T - t} & \text{for } t < \delta \\[1ex] \dfrac{1}{\mu_T} & \text{for } t \ge \delta \end{cases}$$
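
The hazard rate of the equilibrium distribution can be derived in a few lines. The sketch below is a worked version of that calculation; the symbols $F$ (distribution function of the time increment), $f_e$ and $F_e$ (equilibrium density and distribution function) are labels introduced here for illustration and are not taken from the original.

```latex
\begin{align*}
1 - F(t) &= \begin{cases} 1, & t < \delta \\ e^{-(t-\delta)/\mu_T}, & t \ge \delta \end{cases}
\qquad\qquad f_e(t) = \frac{1 - F(t)}{\delta + \mu_T} \\[1ex]
1 - F_e(t) &= \frac{1}{\delta + \mu_T} \int_t^{\infty} \bigl(1 - F(s)\bigr)\,ds
  = \begin{cases}
      \dfrac{\delta + \mu_T - t}{\delta + \mu_T}, & t < \delta \\[2ex]
      \dfrac{\mu_T\, e^{-(t-\delta)/\mu_T}}{\delta + \mu_T}, & t \ge \delta
    \end{cases} \\[1ex]
h(t) &= \frac{f_e(t)}{1 - F_e(t)}
  = \begin{cases}
      \dfrac{1}{\delta + \mu_T - t}, & t < \delta \\[2ex]
      \dfrac{1}{\mu_T}, & t \ge \delta
    \end{cases}
\end{align*}
```

On $[0, \delta)$ the hazard rate climbs from $1/(\delta + \mu_T)$ to $1/\mu_T$ and is constant afterwards, so it lies between these two values for all $t$.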

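Since $h(t) \ge 1/(\delta + \mu_T)$ for all $t$, the equilibrium distribution is stochastically dominated by an exponential
with mean $\delta + \mu_T$. Let $R_i$ be a residual having this equilibrium distribution, $R_i'$ be an exponential with
mean $\delta + \mu_T$, and $X_i$ be exponential with mean $\mu_T$. Then the sum $R_i + \delta + X_i$ is stochastically dominated
by $R_i' + \delta + X_i$, and

$$E\left[ \min_{1 \le i \le K} \{R_i + \delta + X_i\} \right] \le \delta + E[\text{minimum of } K \text{ iid exponential-sums, parameters } 1/(\delta + \mu_T) \text{ and } 1/\mu_T]$$

$$\le \delta + \int_0^{\mu_T} \exp\left\{ -\frac{K t^2}{4 \mu_T (\delta + \mu_T)} \right\} dt + \frac{2(\delta + \mu_T)}{K} \exp\left\{ -\frac{K \mu_T}{4(\delta + \mu_T)} \right\},$$

where the second inequality applies the upper bound for exponential-sums with $n = K$, $\lambda_1 = 1/\mu_T$, and
$\lambda_2 = 1/(\delta + \mu_T)$.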

$R_i$ stochastically dominates an exponential with mean $\mu_T$. Consequently a lower bound on the minimum
of interest is found by replacing $R_i$ with such an exponential. Then

$$E\left[ \min_{1 \le i \le K} \{R_i + \delta + X_i\} \right] \ge \delta + E[\text{minimum of } K \text{ iid exponential-sums, both parameters } 1/\mu_T]$$

$$\ge \delta + \mu_T \sqrt{\frac{\pi}{2K}}.$$
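
The two bounds on the expected minimum can be compared against a Monte Carlo estimate. The sketch below is illustrative only and rests on assumptions not stated in the text: the residual is sampled as a mixture (uniform on $[0, \delta]$ with probability $\delta/(\delta + \mu_T)$, otherwise $\delta$ plus an exponential with mean $\mu_T$), which follows from the equilibrium density $(1 - F(t))/(\delta + \mu_T)$; the upper bound is the closed form obtained by integrating the piecewise linear hazard rate; and all function names and parameter values are hypothetical.

```python
import math
import random


def sample_residual(delta, mu, rng):
    """Draw from the equilibrium distribution of delta + exp(mean mu)."""
    if rng.random() < delta / (delta + mu):
        return rng.uniform(0.0, delta)          # flat part of the density on [0, delta]
    return delta + rng.expovariate(1.0 / mu)    # exponential tail beyond delta


def estimate_expected_min(K, delta, mu, trials, seed=0):
    """Monte Carlo estimate of E[min over K of (R_i + delta + X_i)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(
            sample_residual(delta, mu, rng) + delta + rng.expovariate(1.0 / mu)
            for _ in range(K)
        )
    return total / trials


def lower_bound(K, delta, mu):
    """delta + mu * sqrt(pi / (2K)): residual replaced by an exponential with mean mu."""
    return delta + mu * math.sqrt(math.pi / (2.0 * K))


def upper_bound(K, delta, mu):
    """delta + bound for exponential-sums with rates 1/mu and 1/(delta + mu)."""
    lam1, lam2 = 1.0 / mu, 1.0 / (delta + mu)
    c = K * lam1 * lam2 / 4.0
    ramp = 0.5 * math.sqrt(math.pi / c) * math.erf(math.sqrt(c) / lam1)
    tail = (2.0 / (K * lam2)) * math.exp(-K * lam2 / (4.0 * lam1))
    return delta + ramp + tail


if __name__ == "__main__":
    K, delta, mu = 8, 1.0, 1.0  # illustrative values only
    print(lower_bound(K, delta, mu),
          estimate_expected_min(K, delta, mu, trials=20_000),
          upper_bound(K, delta, mu))
```

The gap between the estimate and either bound indicates how much is lost by replacing the residual with an exponential on each side.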